
    Influence in Classification via Cooperative Game Theory

    A dataset has been classified by some unknown classifier into two types of points. What were the most important factors in determining the classification outcome? In this work, we employ an axiomatic approach in order to uniquely characterize an influence measure: a function that, given a set of classified points, outputs a value for each feature corresponding to its influence in determining the classification outcome. We show that our influence measure takes on an intuitive form when the unknown classifier is linear. Finally, we employ our influence measure in order to analyze the effects of user profiling on Google's online display advertising.
    Comment: accepted to IJCAI 2015
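    As a rough illustration of what a feature-influence computation can look like in the linear setting, the sketch below scores each feature by its empirical covariance with the observed labels. This is a stand-in heuristic chosen for simplicity, not the axiomatically characterized measure from the paper.

```python
# A minimal sketch (not the paper's axiomatic measure): score each
# feature by its empirical covariance with the observed binary label,
# which for a hidden linear classifier tracks how strongly the feature
# pushes points across the decision boundary.
import numpy as np

def influence(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """X: (m, n) feature matrix; y: (m,) labels in {0, 1}.
    Returns one influence score per feature."""
    y_centered = y - y.mean()
    X_centered = X - X.mean(axis=0)
    return X_centered.T @ y_centered / len(y)

# Example: feature 0 drives the (unknown) classifier, feature 1 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(float)   # hidden linear classifier
print(influence(X, y))            # score for feature 0 dominates
```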

    A Logical Method for Policy Enforcement over Evolving Audit Logs

    We present an iterative algorithm for enforcing policies represented in a first-order logic, which can, in particular, express all transmission-related clauses in the HIPAA Privacy Rule. The logic has three features that raise challenges for enforcement: uninterpreted predicates (used to model subjective concepts in privacy policies), real-time temporal properties, and quantification over infinite domains (such as the set of messages containing personal information). The algorithm operates over audit logs that are inherently incomplete and evolve over time. In each iteration, the algorithm provably checks as much of the policy as possible over the current log and outputs a residual policy that can only be checked when the log is extended with additional information. We prove correctness and termination properties of the algorithm. While these results are developed in a general form, accounting for many different sources of incompleteness in audit logs, we also prove that for the special case of logs that maintain a complete record of all relevant actions, the algorithm effectively enforces all safety and co-safety properties. The algorithm can significantly help automate enforcement of policies derived from the HIPAA Privacy Rule.
    Comment: Carnegie Mellon University CyLab Technical Report. 51 pages
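    The iterate-and-residuate control flow can be sketched as follows. The toy policy language here (named atomic checks that return true, false, or undecided) is an assumption made for illustration and is far simpler than the paper's first-order temporal logic; only the shape of the loop is faithful.

```python
# A minimal sketch of iterative enforcement over an evolving audit log:
# check what is decidable now, report violations, and return a residual
# policy to re-check once the log is extended.
from typing import Callable, Optional

# Each check returns True/False when the log has enough information,
# or None when it can only be decided on a future, extended log.
Check = Callable[[list], Optional[bool]]

def enforce(policy: dict[str, Check], log: list) -> dict[str, Check]:
    """Check as much of the policy as possible; return the residual."""
    residual = {}
    for name, check in policy.items():
        verdict = check(log)
        if verdict is False:
            print(f"VIOLATION: {name}")
        elif verdict is None:
            residual[name] = check       # re-check when the log grows
    return residual

# Hypothetical rule: disclosures must have a matching consent entry.
def consent_before_disclosure(log: list) -> Optional[bool]:
    disclosures = [e for e in log if e["type"] == "disclose"]
    if not disclosures:
        return None                      # nothing to decide yet
    consents = {e["patient"] for e in log if e["type"] == "consent"}
    return all(d["patient"] in consents for d in disclosures)

log = [{"type": "consent", "patient": "p1"}]
residual = enforce({"consent-rule": consent_before_disclosure}, log)
log.append({"type": "disclose", "patient": "p1"})
residual = enforce(residual, log)        # now decidable: passes
```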

    A Methodology for Information Flow Experiments

    Information flow analysis has largely ignored the setting where the analyst has neither control over nor a complete model of the analyzed system. We formalize such limited information flow analyses and study an instance of them: detecting the usage of data by websites. We prove that these problems are ones of causal inference. Leveraging this connection, we push beyond traditional information flow analysis to provide a systematic methodology based on experimental science and statistical analysis. Our methodology allows us to systematize prior work in the area, viewing it as instances of a general approach. Our systematic study leads to practical advice for improving work on detecting data usage, a previously unformalized area. We illustrate these concepts with a series of experiments collecting data on the use of information by websites, which we statistically analyze.
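    To convey the experimental flavor of such a methodology, the sketch below runs a standard permutation test on hypothetical ad counts from browser profiles randomly assigned to a treatment (visiting certain sites) or control group. The setup and numbers are invented for illustration; the paper develops the methodology in full generality.

```python
# A minimal sketch: randomized assignment of profiles plus a permutation
# test for whether observed website outputs differ between groups, i.e.,
# whether the site's behavior is causally affected by the browsing data.
import random

def permutation_test(treatment: list[float], control: list[float],
                     trials: int = 10_000) -> float:
    """p-value for the difference in means under random relabeling."""
    observed = abs(sum(treatment) / len(treatment)
                   - sum(control) / len(control))
    pooled = treatment + control
    hits = 0
    for _ in range(trials):
        random.shuffle(pooled)
        t, c = pooled[:len(treatment)], pooled[len(treatment):]
        diff = abs(sum(t) / len(t) - sum(c) / len(c))
        if diff >= observed:
            hits += 1
    return hits / trials

# Hypothetical data: counts of travel-related ads per randomized profile.
treatment = [7, 9, 8, 10, 9, 11]   # profiles that visited travel sites
control   = [3, 4, 2, 5, 3, 4]     # profiles that did not
print(permutation_test(treatment, control))  # small p => data use detected
```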

    Formal Verification of Differential Privacy for Interactive Systems

    Differential privacy is a promising approach to privacy-preserving data analysis with a well-developed theory for functions. Despite recent work on implementing systems that aim to provide differential privacy, the problem of formally verifying that these systems have differential privacy has not been adequately addressed. This paper presents the first results towards automated verification of source code for differentially private interactive systems. We develop a formal probabilistic automaton model of differential privacy for systems by adapting prior work on differential privacy for functions. The main technical result of the paper is a sound proof technique, based on a form of probabilistic bisimulation relation, for proving that a system modeled as a probabilistic automaton satisfies differential privacy. The novelty lies in the way we track quantitative privacy leakage bounds using a relation family instead of a single relation. We illustrate the proof technique on a representative automaton motivated by PINQ, an implemented system that is intended to provide differential privacy. To make our proof technique easier to apply to realistic systems, we prove a form of refinement theorem and apply it to show that a refinement of the abstract PINQ automaton also satisfies our differential privacy definition. Finally, we begin the process of automating our proof technique by providing an algorithm for mechanically checking a restricted class of relations from the proof technique.
    Comment: 65 pages with 1 figure
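    As a hedged illustration of the property being verified, the sketch below brute-force checks the epsilon-differential-privacy ratio condition for a small randomized-response mechanism. The paper's bisimulation technique exists precisely because such exhaustive checks do not scale to interactive, stateful systems; this only shows the definition the technique establishes.

```python
# A minimal sketch: for a finite, non-interactive mechanism we can check
# epsilon-DP directly by bounding output-probability ratios on
# neighboring inputs. (Not feasible for the interactive automata the
# paper targets; shown here only to illustrate the definition.)
import math

def randomized_response(bit: int, eps: float) -> dict[int, float]:
    """Report the true bit with probability e^eps / (1 + e^eps)."""
    p = math.exp(eps) / (1 + math.exp(eps))
    return {bit: p, 1 - bit: 1 - p}

def satisfies_dp(mechanism, neighbors: list[tuple], eps: float) -> bool:
    for d1, d2 in neighbors:
        out1, out2 = mechanism(d1), mechanism(d2)
        for o in out1:
            # small tolerance guards against floating-point equality
            if out1[o] > math.exp(eps) * out2.get(o, 0.0) + 1e-12:
                return False
    return True

mech = lambda b: randomized_response(b, 0.5)
print(satisfies_dp(mech, [(0, 1), (1, 0)], 0.5))   # True: meets its budget
print(satisfies_dp(mech, [(0, 1), (1, 0)], 0.4))   # False: leaks more than 0.4
```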

    Differentially Private Data Analysis of Social Networks via Restricted Sensitivity

    We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that, instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over a restricted class of datasets. Specifically, given a query f and a hypothesis H about the structure of a dataset D, we show generically how to transform f into a new query f_H whose global sensitivity (over all datasets, including those that do not satisfy H) matches the restricted sensitivity of the query f. Moreover, if the belief of the querier is correct (i.e., D is in H), then f_H(D) = f(D). If the belief is incorrect, then f_H(D) may be inaccurate. We demonstrate the usefulness of this notion by considering the task of answering queries regarding social networks, which we model as a combination of a graph and a labeling of its vertices. In particular, while our generic procedure is computationally inefficient, for the specific definition of H as graphs of bounded degree, we exhibit efficient ways of constructing f_H using different projection-based techniques. We then analyze two important query classes: subgraph counting queries (e.g., the number of triangles) and local profile queries (e.g., the number of people who know a spy and a computer scientist who know each other). We demonstrate that the restricted sensitivity of such queries can be significantly lower than their smooth sensitivity. Thus, using restricted sensitivity we can maintain privacy whether or not D is in H, while providing more accurate results in the event that H holds true.
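    A rough sketch of the bounded-degree case follows: project the graph onto the hypothesis H (maximum degree at most d), then answer a triangle count with Laplace noise scaled to the restricted sensitivity of roughly d rather than the global sensitivity of roughly n. The crude edge-dropping projection and the sensitivity constant here are illustrative assumptions; the paper's projection-based constructions are more careful and more accurate.

```python
# A minimal sketch of restricted sensitivity for triangle counting under
# H = "maximum degree <= d", assuming edge-neighboring graphs.
import numpy as np
import networkx as nx

def project_to_bounded_degree(G: nx.Graph, d: int) -> nx.Graph:
    """Crude projection onto H: drop excess edges at high-degree nodes.
    (Illustrative only; the paper gives smarter projections.)"""
    H = G.copy()
    for v in list(H.nodes):
        extra = H.degree(v) - d
        if extra > 0:
            H.remove_edges_from([(v, u) for u in list(H.neighbors(v))][:extra])
    return H

def private_triangle_count(G: nx.Graph, d: int, eps: float) -> float:
    H = project_to_bounded_degree(G, d)
    true_count = sum(nx.triangles(H).values()) // 3
    # One edge change creates/destroys at most ~d triangles when degrees
    # are bounded by d, versus ~n for unrestricted graphs.
    sensitivity = d
    return true_count + np.random.laplace(scale=sensitivity / eps)

G = nx.random_regular_graph(4, 100, seed=1)   # already satisfies H for d=4
print(private_triangle_count(G, d=4, eps=1.0))
```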

    Towards Human Computable Passwords

    An interesting challenge for the cryptography community is to design authentication protocols that are so simple that a human can execute them without relying on a fully trusted computer. We propose several candidate authentication protocols for a setting in which the human user can only receive assistance from a semi-trusted computer: a computer that stores information and performs computations correctly but does not provide confidentiality. Our schemes use a semi-trusted computer to store and display public challenges $C_i\in[n]^k$. The human user memorizes a random secret mapping $\sigma:[n]\rightarrow\mathbb{Z}_d$ and authenticates by computing responses $f(\sigma(C_i))$ to a sequence of public challenges, where $f:\mathbb{Z}_d^k\rightarrow\mathbb{Z}_d$ is a function that is easy for the human to evaluate. We prove that any statistical adversary needs to sample $m=\tilde{\Omega}(n^{s(f)})$ challenge-response pairs to recover $\sigma$, for a security parameter $s(f)$ that depends on two key properties of $f$. To obtain our results, we apply the general hypercontractivity theorem to lower bound the statistical dimension of the distribution over challenge-response pairs induced by $f$ and $\sigma$. Our lower bounds apply to arbitrary functions $f$ (not just to functions that are easy for a human to evaluate), and generalize recent results of Feldman et al. As an application, we propose a family of human computable password functions $f_{k_1,k_2}$ in which the user needs to perform $2k_1+2k_2+1$ primitive operations (e.g., adding two digits or remembering $\sigma(i)$), and we show that $s(f) = \min\{k_1+1, (k_2+1)/2\}$. For these schemes, we prove that forging passwords is equivalent to recovering the secret mapping. Thus, our human computable password schemes can maintain strong security guarantees even after an adversary has observed the user log in to many different accounts.
    Comment: Fixed bug in definition of $Q^{f,j}$ and modified proofs accordingly
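    The challenge-response setting can be sketched as follows, with a toy response function $f$ (sum of the mapped digits mod $d$) standing in for the paper's carefully chosen $f_{k_1,k_2}$; the parameters and the function are illustrative assumptions only.

```python
# A minimal sketch of the human computable password setting: the user
# memorizes a secret map sigma over n symbols and answers public
# challenges by evaluating a simple function of the mapped values.
import random

n, k, d = 26, 4, 10                     # alphabet size, challenge length, base
sigma = {i: random.randrange(d) for i in range(n)}  # memorized secret map

def f(values: list[int]) -> int:
    """Toy response function (NOT the paper's f_{k1,k2}): sum mod d."""
    return sum(values) % d

def respond(challenge: list[int]) -> int:
    """What the human computes in their head: f(sigma(C_i))."""
    return f([sigma[c] for c in challenge])

challenge = random.sample(range(n), k)  # public challenge C_i in [n]^k
print(challenge, "->", respond(challenge))
# The server, which knows sigma, verifies the response; an eavesdropper
# must observe many (challenge, response) pairs to recover sigma.
```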